Task 1

In [1]:
# Importing the data file and creating a pandas df
from google.colab import files
uploaded = files.upload()
Saving keyword_data.csv to keyword_data.csv
In [2]:
import pandas as pd
import numpy as np

keyword_data = pd.read_csv('keyword_data.csv')
keyword_data.shape
Out[2]:
(66, 13)
In [3]:
# Previewing data
keyword_data.head()
Out[3]:
Title Keyword 1 Keyword 2 Keyword 3 Keyword 4 Keyword 5 Keyword 6 Keyword 7 Keyword 8 Keyword 9 Keyword 10 Keyword 11 Keyword 12
0 Feb-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Meta-Analyses of Financial Performance and Equ... EQUITY ORGANIZATIONAL sociology PERFORMANCE META-analysis PSYCHOMETRICS ORGANIZATIONAL research FINANCIAL performance AGENCY theory ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior CORPORATE governance NaN
3 Home Country Environments, Corporate Diversifi... DIVERSIFICATION in industry BUSINESS planning PERFORMANCE standards EMPLOYEES -- Rating of CORPORATE culture STRATEGIC planning ORGANIZATIONAL effectiveness MANAGEMENT science MANAGEMENT research PRODUCT management NaN NaN
4 Safeguarding Investments in Asymmetric Interor... INTERORGANIZATIONAL relations INTERGROUP relations BUSINESS communication INVESTMENTS SUPPLY chains KNOWLEDGE management INTERORGANIZATIONAL networks CORPORATE governance GROUP decision making INTELLECTUAL capital NaN NaN

Cleaning the dataset

In [4]:
### Filtering out empty rows

keyword_data = keyword_data[keyword_data['Title'].notnull()]
keyword_data.shape
Out[4]:
(55, 13)
In [5]:
# Building a helper column that concatenates all keywords; rows whose Title is a date (e.g. 'Feb-03') have no keywords
keyword_cols = [f'Keyword {i}' for i in range(1, 13)]
keyword_data = keyword_data.copy()  # explicit copy avoids SettingWithCopyWarning on the filtered frame
keyword_data['all_keywords'] = keyword_data[keyword_cols].fillna('').agg(''.join, axis=1)
In [6]:
keyword_data = keyword_data[keyword_data['all_keywords']!='']
keyword_data = keyword_data.iloc[:,:-1].reset_index(drop=True)
keyword_data.head()
Out[6]:
Title Keyword 1 Keyword 2 Keyword 3 Keyword 4 Keyword 5 Keyword 6 Keyword 7 Keyword 8 Keyword 9 Keyword 10 Keyword 11 Keyword 12
0 Meta-Analyses of Financial Performance and Equ... EQUITY ORGANIZATIONAL sociology PERFORMANCE META-analysis PSYCHOMETRICS ORGANIZATIONAL research FINANCIAL performance AGENCY theory ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior CORPORATE governance NaN
1 Home Country Environments, Corporate Diversifi... DIVERSIFICATION in industry BUSINESS planning PERFORMANCE standards EMPLOYEES -- Rating of CORPORATE culture STRATEGIC planning ORGANIZATIONAL effectiveness MANAGEMENT science MANAGEMENT research PRODUCT management NaN NaN
2 Safeguarding Investments in Asymmetric Interor... INTERORGANIZATIONAL relations INTERGROUP relations BUSINESS communication INVESTMENTS SUPPLY chains KNOWLEDGE management INTERORGANIZATIONAL networks CORPORATE governance GROUP decision making INTELLECTUAL capital NaN NaN
3 Managerialist and Human Capital Explanations f... EXECUTIVE compensation WAGES HUMAN capital LABOR economics PERSONNEL management MANAGEMENT science CONTINGENCY theory (Management) COMPENSATION management EXECUTIVE ability (Management) CORPORATE governance NaN NaN
4 Bidding Wars Over R&D-Intensive Firms: Knowled... KNOWLEDGE management INFORMATION resources management MANAGEMENT information systems BREAK-even analysis DATA mining MANAGEMENT science RESEARCH & development RESEARCH & development contracts CORPORATE governance DECISION making ORGANIZATIONAL behavior TRANSACTION costs
In [7]:
### Changing all keyword columns to string datatype
for i in range(1, keyword_data.shape[1]):
  keyword_data[f'Keyword {i}']=keyword_data[f'Keyword {i}'].astype('string')

keyword_data.dtypes
Out[7]:
Title         object
Keyword 1     string
Keyword 2     string
Keyword 3     string
Keyword 4     string
Keyword 5     string
Keyword 6     string
Keyword 7     string
Keyword 8     string
Keyword 9     string
Keyword 10    string
Keyword 11    string
Keyword 12    string
dtype: object

Extracting keywords and creating a weighted adjacency matrix

In [8]:
df=keyword_data
# Replacing nulls with 'NA'
df = df.fillna('NA')
df.head()
Out[8]:
Title Keyword 1 Keyword 2 Keyword 3 Keyword 4 Keyword 5 Keyword 6 Keyword 7 Keyword 8 Keyword 9 Keyword 10 Keyword 11 Keyword 12
0 Meta-Analyses of Financial Performance and Equ... EQUITY ORGANIZATIONAL sociology PERFORMANCE META-analysis PSYCHOMETRICS ORGANIZATIONAL research FINANCIAL performance AGENCY theory ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior CORPORATE governance NA
1 Home Country Environments, Corporate Diversifi... DIVERSIFICATION in industry BUSINESS planning PERFORMANCE standards EMPLOYEES -- Rating of CORPORATE culture STRATEGIC planning ORGANIZATIONAL effectiveness MANAGEMENT science MANAGEMENT research PRODUCT management NA NA
2 Safeguarding Investments in Asymmetric Interor... INTERORGANIZATIONAL relations INTERGROUP relations BUSINESS communication INVESTMENTS SUPPLY chains KNOWLEDGE management INTERORGANIZATIONAL networks CORPORATE governance GROUP decision making INTELLECTUAL capital NA NA
3 Managerialist and Human Capital Explanations f... EXECUTIVE compensation WAGES HUMAN capital LABOR economics PERSONNEL management MANAGEMENT science CONTINGENCY theory (Management) COMPENSATION management EXECUTIVE ability (Management) CORPORATE governance NA NA
4 Bidding Wars Over R&D-Intensive Firms: Knowled... KNOWLEDGE management INFORMATION resources management MANAGEMENT information systems BREAK-even analysis DATA mining MANAGEMENT science RESEARCH & development RESEARCH & development contracts CORPORATE governance DECISION making ORGANIZATIONAL behavior TRANSACTION costs
In [9]:
### Converting from wide to long format to get the list of distinct keywords
melted_df = pd.melt(df, 
                    id_vars=['Title'],
                    value_vars=['Keyword 1','Keyword 2','Keyword 3', 'Keyword 4','Keyword 5','Keyword 6','Keyword 7','Keyword 8','Keyword 9','Keyword 10','Keyword 11','Keyword 12'],
                    var_name='keyword_number',
                    value_name='keyword')

keywords = list(melted_df['keyword'].unique())

### Removing 'NA' from the list of keywords
keywords.remove('NA')
print(keywords)
['EQUITY', 'DIVERSIFICATION in industry', 'INTERORGANIZATIONAL relations', 'EXECUTIVE compensation', 'KNOWLEDGE management', 'EMOTIONS (Psychology)', 'SUPERVISORS', 'INDUSTRIAL relations', 'DECISION making', 'CORPORATE governance', 'EXECUTIVES', 'FAMILY-owned business enterprises', 'INSTITUTIONAL investors', 'RESEARCH & development', 'PROPERTY', 'STOCK options', 'MANAGEMENT science', 'AGGRESSION (Psychology)', 'CHIEF executive officers', 'MENTAL fatigue', 'PERSONNEL management', 'PRODUCT management', 'SOCIAL capital (Sociology)', 'ORGANIZATIONAL behavior', 'NEW products', 'LEADERSHIP', 'TEAMS in the workplace', 'LABOR supply', 'EMPLOYEES -- Attitudes', 'WORK & family', 'HUMAN capital', 'SOCIAL status', 'EMPLOYEE motivation', 'ORGANIZATIONAL change', 'CREATIVE ability', 'GOING public (Securities)', 'INTERNATIONAL business enterprises -- Management', 'COMPENSATION management', 'CROSS-functional teams', 'SERVICE industries -- Management', 'ORGANIZATIONAL sociology', 'BUSINESS planning', 'INTERGROUP relations', 'WAGES', 'INFORMATION resources management', 'INTERPERSONAL relations', 'JUSTICE', 'INDUSTRIAL management', 'STOCKHOLDERS wealth', 'DEBT', 'INVESTMENTS', 'PERFORMANCE', 'STOCKS (Finance)', 'INDUSTRIAL organization', 'VIOLENCE', 'PERSONNEL changes', 'JOB stress', 'INFRASTRUCTURE (Economics)', 'MULTILEVEL marketing', 'PERFORMANCE evaluation', 'LABOR organizing', 'GENEROSITY', 'EXECUTIVE ability (Management)', 'JOB performance', 'EMPLOYEE rules', 'TAIWANESE', 'CORPORATE image', 'CORPORATIONS -- Finance', 'FOREIGN subsidiaries -- Management', 'COMPETITIVE advantage', 'CUSTOMER relations', 'PERFORMANCE standards', 'BUSINESS communication', 'MANAGEMENT information systems', 'STRESS (Psychology)', 'CONFLICT management', 'DECISION theory', 'STOCK repurchasing', 'DIRECTORS of corporations', 'STOCKHOLDERS', 'SCREENWRITERS', 'SOCIAL psychology', 'SUCCESSION planning', 'INDUSTRIAL psychology', 'PROBLEM solving', 'VENTURE capital', 'ORGANIZATIONAL commitment', 'COMMERCIAL 
products', 'STRATEGIC planning', 'CRITICAL thinking', 'CONDUCT of life', 'CAPITAL investments', 'BEHAVIORAL research', 'HUMAN error', 'EMPLOYEES', 'STOCKHOLDERS -- Attitudes', 'INCENTIVES in industry', 'CREATIVE ability in business', 'EMPLOYEE selection', 'BUSINESS networks', 'GROUP identity', 'META-analysis', 'EMPLOYEES -- Rating of', 'LABOR economics', 'BREAK-even analysis', 'SOCIAL interaction', 'MEDIATION', 'AGENCY theory', 'GLOBALIZATION', 'BUSINESS enterprises', 'PROFIT', 'STOCK ownership', 'ORGANIZATIONAL effectiveness', 'ORGANIZATIONAL justice', 'EXECUTIVE succession', 'BURNOUT (Psychology)', 'QUALITY of products', 'MARKETING management', 'SELF-management (Psychology)', 'WORKFLOW', 'SOCIAL influence', 'WOMEN employees', 'LABOR productivity', 'MOTIVATION (Psychology)', 'RISK', 'CAPITALISTS & financiers', 'OPTIONS (Finance)', 'EXECUTIVES -- Recruiting', 'HOSPITALS -- Administration', 'PRODUCTION management', 'PSYCHOMETRICS', 'CORPORATE culture', 'SUPPLY chains', 'DATA mining', 'PUNCTUATED equilibrium (Evolution)', 'HIGH technology industries', 'BOARDS of directors', 'MINORITY stockholders', 'WORK environment', 'SOCIAL networks', 'CONTAGION (Social psychology)', 'DECENTRALIZATION in management', 'QUALITY of work life', 'MARKETING', 'MANAGEMENT -- Employee participation', 'MANAGEMENT', 'EMPLOYEE loyalty', 'INDIVIDUAL differences', 'STOCKS (Finance) -- Prices', 'CONSOLIDATION & merger of corporations', 'SOCIAL exchange', 'MASS media', 'CORPORATIONS -- Valuation', 'INNOVATIONS in business', 'ORGANIZATIONAL goals', 'ORGANIZATIONAL research', 'ORGANIZATIONAL structure', 'INTERNATIONAL business enterprises', 'MUNICIPAL corporations', 'EMINENT domain', 'EMPLOYEE stock options', 'SOCIAL judgment theory (Communication)', 'JOB satisfaction', 'CRITICAL incident technique', 'EXECUTIVES -- Dismissal of', 'CORPORATIONS -- Investor relations', 'INNOVATION management', 'WORK environment -- Psychological aspects', 'FINANCIAL performance', 'CUSTOMER services', 
'INTERORGANIZATIONAL networks', 'CONTINGENCY theory (Management)', 'EMPLOYEE ownership', 'FOREIGN investments', 'MOTION picture authorship', 'ENTREPRENEURSHIP', 'AMBIVALENCE', 'MARKETING -- Decision making', 'TASK analysis', 'SOCIAL context', 'HUMAN resource accounting', 'SOCIAL factors', 'PYGMALION (Greek mythology)', 'MATHEMATICAL statistics', 'RESOURCE management', 'WAGE payment systems', 'LABOR process', 'RESEARCH & development contracts', 'CUSTOMER satisfaction', 'UNITED States -- National Guard', 'PENSION trusts', 'STEWARDS', 'SELF-perception', 'SUPPLIERS', 'VIOLENCE in the workplace', 'MANAGEMENT research', 'EMPLOYEE recruitment', 'PRODUCT design', 'CAPITAL market', 'WOMEN -- Employment', 'EMPLOYEES -- Attitudes -- Research', 'CHARISMATIC authority', 'GALATEA, sea nymph (Greek deity)', 'CROSS-cultural differences', 'CORPORATIONS -- Public relations', 'SHIPBUILDING industry', 'RESOURCE-based theory of the firm', 'GROUP decision making', 'BUSINESS models', 'HIGH technology', 'STRATEGIC alliances (Business)', 'ANGER in the workplace', 'INTRINSIC motivation', 'PRODUCT lines', 'DELEGATION of authority', 'LABOR turnover', 'SELF-congruence', 'GOAL setting in personnel management', 'PUBLIC companies', 'BUSINESS enterprises -- Valuation', 'TECHNOLOGICAL innovations -- Economic aspects', 'HUMAN capital -- Management', 'INTELLECTUAL capital', 'PEER review (Professional performance)', 'RISK management in business', 'JOB qualifications', 'PRODUCT information management', 'MANAGEMENT styles', 'REWARD (Psychology)', 'OCCUPATIONAL roles', 'ERROR rates', 'TURNOVER (Business)', 'SUCCESS in business', 'DIVISION of labor', 'EMPLOYMENT in foreign countries', 'INDUSTRIAL efficiency', 'RESOURCE allocation', 'TECHNOLOGICAL innovations', 'PROBLEM employees', 'STRATEGIC business units', 'SUBSIDIARY corporations -- Management', 'FINANCIAL management', 'CUSTOMER orientation', 'TRANSACTION costs', 'INNOVATION adoption', 'WORK attitudes', 'HOST countries (Business)', 'MARKETING strategy']
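
The wide-to-long step can also be done without the 'NA' placeholder by dropping nulls before taking the distinct values. The following is a minimal sketch on made-up data; `toy_wide`, `toy_long` and `toy_keywords` are hypothetical names and are not part of the notebook.

```python
import pandas as pd

# Hypothetical two-article table in the same wide layout as keyword_data
toy_wide = pd.DataFrame({
    "Title": ["Paper 1", "Paper 2"],
    "Keyword 1": ["EQUITY", "WAGES"],
    "Keyword 2": ["WAGES", None],
})

# One row per (Title, keyword slot); empty slots stay null
toy_long = toy_wide.melt(id_vars="Title", var_name="keyword_number", value_name="keyword")

# Drop the nulls first, then collect distinct keywords in order of first appearance
toy_keywords = toy_long["keyword"].dropna().unique().tolist()
print(toy_keywords)
```

Dropping nulls up front avoids having to remember to remove a sentinel value from the list afterwards.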
In [10]:
df.shape[0]
Out[10]:
49
In [11]:
### Creating a mapping table to map each distinct keyword to an index
dic = dict(enumerate(keywords))

dic
Out[11]:
{0: 'EQUITY',
 1: 'DIVERSIFICATION in industry',
 2: 'INTERORGANIZATIONAL relations',
 3: 'EXECUTIVE compensation',
 4: 'KNOWLEDGE management',
 5: 'EMOTIONS (Psychology)',
 6: 'SUPERVISORS',
 7: 'INDUSTRIAL relations',
 8: 'DECISION making',
 9: 'CORPORATE governance',
 10: 'EXECUTIVES',
 11: 'FAMILY-owned business enterprises',
 12: 'INSTITUTIONAL investors',
 13: 'RESEARCH & development',
 14: 'PROPERTY',
 15: 'STOCK options',
 16: 'MANAGEMENT science',
 17: 'AGGRESSION (Psychology)',
 18: 'CHIEF executive officers',
 19: 'MENTAL fatigue',
 20: 'PERSONNEL management',
 21: 'PRODUCT management',
 22: 'SOCIAL capital (Sociology)',
 23: 'ORGANIZATIONAL behavior',
 24: 'NEW products',
 25: 'LEADERSHIP',
 26: 'TEAMS in the workplace',
 27: 'LABOR supply',
 28: 'EMPLOYEES -- Attitudes',
 29: 'WORK & family',
 30: 'HUMAN capital',
 31: 'SOCIAL status',
 32: 'EMPLOYEE motivation',
 33: 'ORGANIZATIONAL change',
 34: 'CREATIVE ability',
 35: 'GOING public (Securities)',
 36: 'INTERNATIONAL business enterprises -- Management',
 37: 'COMPENSATION management',
 38: 'CROSS-functional teams',
 39: 'SERVICE industries -- Management',
 40: 'ORGANIZATIONAL sociology',
 41: 'BUSINESS planning',
 42: 'INTERGROUP relations',
 43: 'WAGES',
 44: 'INFORMATION resources management',
 45: 'INTERPERSONAL relations',
 46: 'JUSTICE',
 47: 'INDUSTRIAL management',
 48: 'STOCKHOLDERS wealth',
 49: 'DEBT',
 50: 'INVESTMENTS',
 51: 'PERFORMANCE',
 52: 'STOCKS (Finance)',
 53: 'INDUSTRIAL organization',
 54: 'VIOLENCE',
 55: 'PERSONNEL changes',
 56: 'JOB stress',
 57: 'INFRASTRUCTURE (Economics)',
 58: 'MULTILEVEL marketing',
 59: 'PERFORMANCE evaluation',
 60: 'LABOR organizing',
 61: 'GENEROSITY',
 62: 'EXECUTIVE ability (Management)',
 63: 'JOB performance',
 64: 'EMPLOYEE rules',
 65: 'TAIWANESE',
 66: 'CORPORATE image',
 67: 'CORPORATIONS -- Finance',
 68: 'FOREIGN subsidiaries -- Management',
 69: 'COMPETITIVE advantage',
 70: 'CUSTOMER relations',
 71: 'PERFORMANCE standards',
 72: 'BUSINESS communication',
 73: 'MANAGEMENT information systems',
 74: 'STRESS (Psychology)',
 75: 'CONFLICT management',
 76: 'DECISION theory',
 77: 'STOCK repurchasing',
 78: 'DIRECTORS of corporations',
 79: 'STOCKHOLDERS',
 80: 'SCREENWRITERS',
 81: 'SOCIAL psychology',
 82: 'SUCCESSION planning',
 83: 'INDUSTRIAL psychology',
 84: 'PROBLEM solving',
 85: 'VENTURE capital',
 86: 'ORGANIZATIONAL commitment',
 87: 'COMMERCIAL products',
 88: 'STRATEGIC planning',
 89: 'CRITICAL thinking',
 90: 'CONDUCT of life',
 91: 'CAPITAL investments',
 92: 'BEHAVIORAL research',
 93: 'HUMAN error',
 94: 'EMPLOYEES',
 95: 'STOCKHOLDERS -- Attitudes',
 96: 'INCENTIVES in industry',
 97: 'CREATIVE ability in business',
 98: 'EMPLOYEE selection',
 99: 'BUSINESS networks',
 100: 'GROUP identity',
 101: 'META-analysis',
 102: 'EMPLOYEES -- Rating of',
 103: 'LABOR economics',
 104: 'BREAK-even analysis',
 105: 'SOCIAL interaction',
 106: 'MEDIATION',
 107: 'AGENCY theory',
 108: 'GLOBALIZATION',
 109: 'BUSINESS enterprises',
 110: 'PROFIT',
 111: 'STOCK ownership',
 112: 'ORGANIZATIONAL effectiveness',
 113: 'ORGANIZATIONAL justice',
 114: 'EXECUTIVE succession',
 115: 'BURNOUT (Psychology)',
 116: 'QUALITY of products',
 117: 'MARKETING management',
 118: 'SELF-management (Psychology)',
 119: 'WORKFLOW',
 120: 'SOCIAL influence',
 121: 'WOMEN employees',
 122: 'LABOR productivity',
 123: 'MOTIVATION (Psychology)',
 124: 'RISK',
 125: 'CAPITALISTS & financiers',
 126: 'OPTIONS (Finance)',
 127: 'EXECUTIVES -- Recruiting',
 128: 'HOSPITALS -- Administration',
 129: 'PRODUCTION management',
 130: 'PSYCHOMETRICS',
 131: 'CORPORATE culture',
 132: 'SUPPLY chains',
 133: 'DATA mining',
 134: 'PUNCTUATED equilibrium (Evolution)',
 135: 'HIGH technology industries',
 136: 'BOARDS of directors',
 137: 'MINORITY stockholders',
 138: 'WORK environment',
 139: 'SOCIAL networks',
 140: 'CONTAGION (Social psychology)',
 141: 'DECENTRALIZATION in management',
 142: 'QUALITY of work life',
 143: 'MARKETING',
 144: 'MANAGEMENT -- Employee participation',
 145: 'MANAGEMENT',
 146: 'EMPLOYEE loyalty',
 147: 'INDIVIDUAL differences',
 148: 'STOCKS (Finance) -- Prices',
 149: 'CONSOLIDATION & merger of corporations',
 150: 'SOCIAL exchange',
 151: 'MASS media',
 152: 'CORPORATIONS -- Valuation',
 153: 'INNOVATIONS in business',
 154: 'ORGANIZATIONAL goals',
 155: 'ORGANIZATIONAL research',
 156: 'ORGANIZATIONAL structure',
 157: 'INTERNATIONAL business enterprises',
 158: 'MUNICIPAL corporations',
 159: 'EMINENT domain',
 160: 'EMPLOYEE stock options',
 161: 'SOCIAL judgment theory (Communication)',
 162: 'JOB satisfaction',
 163: 'CRITICAL incident technique',
 164: 'EXECUTIVES -- Dismissal of',
 165: 'CORPORATIONS -- Investor relations',
 166: 'INNOVATION management',
 167: 'WORK environment -- Psychological aspects',
 168: 'FINANCIAL performance',
 169: 'CUSTOMER services',
 170: 'INTERORGANIZATIONAL networks',
 171: 'CONTINGENCY theory (Management)',
 172: 'EMPLOYEE ownership',
 173: 'FOREIGN investments',
 174: 'MOTION picture authorship',
 175: 'ENTREPRENEURSHIP',
 176: 'AMBIVALENCE',
 177: 'MARKETING -- Decision making',
 178: 'TASK analysis',
 179: 'SOCIAL context',
 180: 'HUMAN resource accounting',
 181: 'SOCIAL factors',
 182: 'PYGMALION (Greek mythology)',
 183: 'MATHEMATICAL statistics',
 184: 'RESOURCE management',
 185: 'WAGE payment systems',
 186: 'LABOR process',
 187: 'RESEARCH & development contracts',
 188: 'CUSTOMER satisfaction',
 189: 'UNITED States -- National Guard',
 190: 'PENSION trusts',
 191: 'STEWARDS',
 192: 'SELF-perception',
 193: 'SUPPLIERS',
 194: 'VIOLENCE in the workplace',
 195: 'MANAGEMENT research',
 196: 'EMPLOYEE recruitment',
 197: 'PRODUCT design',
 198: 'CAPITAL market',
 199: 'WOMEN -- Employment',
 200: 'EMPLOYEES -- Attitudes -- Research',
 201: 'CHARISMATIC authority',
 202: 'GALATEA, sea nymph (Greek deity)',
 203: 'CROSS-cultural differences',
 204: 'CORPORATIONS -- Public relations',
 205: 'SHIPBUILDING industry',
 206: 'RESOURCE-based theory of the firm',
 207: 'GROUP decision making',
 208: 'BUSINESS models',
 209: 'HIGH technology',
 210: 'STRATEGIC alliances (Business)',
 211: 'ANGER in the workplace',
 212: 'INTRINSIC motivation',
 213: 'PRODUCT lines',
 214: 'DELEGATION of authority',
 215: 'LABOR turnover',
 216: 'SELF-congruence',
 217: 'GOAL setting in personnel management',
 218: 'PUBLIC companies',
 219: 'BUSINESS enterprises -- Valuation',
 220: 'TECHNOLOGICAL innovations -- Economic aspects',
 221: 'HUMAN capital -- Management',
 222: 'INTELLECTUAL capital',
 223: 'PEER review (Professional performance)',
 224: 'RISK management in business',
 225: 'JOB qualifications',
 226: 'PRODUCT information management',
 227: 'MANAGEMENT styles',
 228: 'REWARD (Psychology)',
 229: 'OCCUPATIONAL roles',
 230: 'ERROR rates',
 231: 'TURNOVER (Business)',
 232: 'SUCCESS in business',
 233: 'DIVISION of labor',
 234: 'EMPLOYMENT in foreign countries',
 235: 'INDUSTRIAL efficiency',
 236: 'RESOURCE allocation',
 237: 'TECHNOLOGICAL innovations',
 238: 'PROBLEM employees',
 239: 'STRATEGIC business units',
 240: 'SUBSIDIARY corporations -- Management',
 241: 'FINANCIAL management',
 242: 'CUSTOMER orientation',
 243: 'TRANSACTION costs',
 244: 'INNOVATION adoption',
 245: 'WORK attitudes',
 246: 'HOST countries (Business)',
 247: 'MARKETING strategy'}
In [12]:
### Creating a 2d numpy array of zeroes and dimension NxN (N: no. of distinct keywords)
import numpy as np
adj_matrix = np.zeros([len(keywords), len(keywords)], dtype = int)
In [13]:
### Creating the adjacency matrix and a dataframe to store all keyword pairs
# Reverse lookup (keyword -> matrix index) so each keyword is resolved with a single dict access
keyword_index = {keyword: idx for idx, keyword in dic.items()}
keyword_cols = [f'Keyword {i}' for i in range(1, df.shape[1])]

df_map = pd.DataFrame({'from':[], 'to':[], 'weight':[]})
for index, row in df.iterrows():
  for col_i in keyword_cols:
    kw_i = row[col_i]
    # Skipping the iteration if keyword = 'NA' is encountered
    if kw_i == 'NA':
      continue
    for col_j in keyword_cols:
      kw_j = row[col_j]
      # A keyword can't be linked to itself, and 'NA' is not a keyword
      if kw_j == kw_i or kw_j == 'NA':
        continue
      df_map.loc[len(df_map.index)] = [kw_i, kw_j, 1]
      # Adding weight (+1) in the matrix when a pair exists
      adj_matrix[keyword_index[kw_i], keyword_index[kw_j]] += 1
In [14]:
df_map.head(10)
Out[14]:
from to weight
0 EQUITY ORGANIZATIONAL sociology 1.0
1 EQUITY PERFORMANCE 1.0
2 EQUITY META-analysis 1.0
3 EQUITY PSYCHOMETRICS 1.0
4 EQUITY ORGANIZATIONAL research 1.0
5 EQUITY FINANCIAL performance 1.0
6 EQUITY AGENCY theory 1.0
7 EQUITY ORGANIZATIONAL effectiveness 1.0
8 EQUITY ORGANIZATIONAL behavior 1.0
9 EQUITY CORPORATE governance 1.0
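
The same co-occurrence counts can be obtained without nested column loops. The sketch below is an alternative, not the method used in this notebook: it counts each unordered keyword pair once per article with `itertools.combinations` and then mirrors the counts into a symmetric structure. The `articles` data and all variable names are hypothetical. It also deduplicates keywords within an article, which the loop above does not do.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each inner list holds the keywords of one article
articles = [
    ["EQUITY", "PERFORMANCE", "AGENCY theory"],
    ["PERFORMANCE", "AGENCY theory"],
    ["EQUITY", "WAGES"],
]

pair_counts = Counter()
for keywords_in_article in articles:
    # Deduplicate and sort so (A, B) and (B, A) map to the same key
    unique_keywords = sorted(set(keywords_in_article))
    for pair in combinations(unique_keywords, 2):
        pair_counts[pair] += 1

# Mirror each count into a symmetric nested-dict adjacency structure
adjacency = {}
for (a, b), weight in pair_counts.items():
    adjacency.setdefault(a, {})[b] = weight
    adjacency.setdefault(b, {})[a] = weight

print(pair_counts.most_common(1))
```

Because every pair is counted once, the resulting weights need no division by 2 later on.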

Weighted adjacency matrix

In [15]:
import sys

### This setting can be uncommented to see the entire adjacency matrix
# np.set_printoptions(threshold=sys.maxsize)
adj_matrix
Out[15]:
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
In [16]:
# Converting the adjacency matrix to a dataframe
df_plot = pd.DataFrame(data=adj_matrix,
                       index=keywords,
                       columns=keywords)
In [17]:
df_plot
Out[17]:
EQUITY DIVERSIFICATION in industry INTERORGANIZATIONAL relations EXECUTIVE compensation KNOWLEDGE management EMOTIONS (Psychology) SUPERVISORS INDUSTRIAL relations DECISION making CORPORATE governance ... PROBLEM employees STRATEGIC business units SUBSIDIARY corporations -- Management FINANCIAL management CUSTOMER orientation TRANSACTION costs INNOVATION adoption WORK attitudes HOST countries (Business) MARKETING strategy
EQUITY 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
DIVERSIFICATION in industry 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 0
INTERORGANIZATIONAL relations 0 0 0 0 1 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
EXECUTIVE compensation 0 0 0 0 0 0 0 0 2 2 ... 0 0 0 0 0 0 0 0 0 0
KNOWLEDGE management 0 0 1 0 0 0 0 0 1 2 ... 0 0 0 0 0 1 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TRANSACTION costs 0 0 0 0 1 0 0 0 1 1 ... 0 0 0 0 0 0 0 0 0 0
INNOVATION adoption 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
WORK attitudes 0 0 0 0 0 0 0 1 0 0 ... 1 0 0 0 0 0 0 0 0 0
HOST countries (Business) 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 0 0 0 0
MARKETING strategy 0 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 1 0 0 0 0 0

248 rows × 248 columns

Plotting the network diagram according to the adjacency table

In [18]:
import networkx as nx
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(100, 100))

# Map each node index to its keyword so the plot shows names instead of integers
labels = dict(enumerate(keywords))

G = nx.Graph(df_plot.values)
edge_labels = nx.get_edge_attributes(G, "weight")
pos = nx.spring_layout(G)
# Draw nodes, labels and edges once, then overlay the edge weights
nx.draw(G, pos, node_size=1000, labels=labels, with_labels=True, font_color="black", node_color = "orange", edge_color = "grey")
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.show()
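
When the adjacency table is a square DataFrame whose index and columns are the keywords, `nx.from_pandas_adjacency` can build the graph with keyword names as node ids, so no separate label dictionary is needed. This is a small sketch on a made-up 3x3 table; `toy_adj`, `toy_graph` and `toy_edge_labels` are hypothetical names. It would only apply to the notebook's table while that table still contains nothing but the keyword columns.

```python
import networkx as nx
import pandas as pd

# Hypothetical 3-keyword adjacency table standing in for the real one
names = ["EQUITY", "PERFORMANCE", "WAGES"]
toy_adj = pd.DataFrame(
    [[0, 2, 0],
     [2, 0, 1],
     [0, 1, 0]],
    index=names,
    columns=names,
)

# Non-zero cells become edges; the cell value is stored as the 'weight' attribute
toy_graph = nx.from_pandas_adjacency(toy_adj)
toy_edge_labels = nx.get_edge_attributes(toy_graph, "weight")
print(toy_edge_labels)
```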

Computing node strength and degree

In [19]:
### Strength is the sum of edge weights, so sum each node's row of the adjacency table
df_plot["node_strength"] = df_plot.sum(axis=1)

### Degree is the number of non-zero entries in the row (excluding the node_strength column just added)
df_plot["node_degree"] = (df_plot.iloc[:,:-1] != 0).sum(axis=1)
df_plot["node"] = df_plot.index

### Creating a df with node name, degree and strength
df_node_info = df_plot[["node","node_degree","node_strength"]].reset_index(drop=True)
df_node_info.head(10)
Out[19]:
node node_degree node_strength
0 EQUITY 10 10
1 DIVERSIFICATION in industry 18 20
2 INTERORGANIZATIONAL relations 24 27
3 EXECUTIVE compensation 31 36
4 KNOWLEDGE management 19 20
5 EMOTIONS (Psychology) 9 9
6 SUPERVISORS 10 10
7 INDUSTRIAL relations 49 59
8 DECISION making 90 112
9 CORPORATE governance 62 85
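
To make the two measures concrete, here is a minimal sketch on a hypothetical 3-node weighted adjacency matrix (`toy_matrix` is made up for illustration). Strength sums the weights in a row, while degree counts the non-zero entries, so strength is never smaller than degree and the two coincide when every edge has weight 1.

```python
import numpy as np

# Hypothetical symmetric weighted adjacency matrix for three nodes
toy_matrix = np.array([
    [0, 2, 0],
    [2, 0, 1],
    [0, 1, 0],
])

toy_strength = toy_matrix.sum(axis=1)        # sum of edge weights per node
toy_degree = (toy_matrix != 0).sum(axis=1)   # number of distinct neighbours per node

print(toy_strength, toy_degree)
```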

Computing weight for each node pair

In [20]:
### Building a label for each pair by sorting its two keywords and joining them with ' : '
### Sorting makes the label undirected - AB and BA get the same label
df_map['node_pair'] = df_map.iloc[:,:-1].apply(lambda x: ' : '.join(sorted(x)), axis=1)
df_map.head()
Out[20]:
from to weight node_pair
0 EQUITY ORGANIZATIONAL sociology 1.0 EQUITY : ORGANIZATIONAL sociology
1 EQUITY PERFORMANCE 1.0 EQUITY : PERFORMANCE
2 EQUITY META-analysis 1.0 EQUITY : META-analysis
3 EQUITY PSYCHOMETRICS 1.0 EQUITY : PSYCHOMETRICS
4 EQUITY ORGANIZATIONAL research 1.0 EQUITY : ORGANIZATIONAL research
In [21]:
### Count by node_pair gives the weight
df_node_pair_weight = df_map.groupby(["node_pair"], as_index=False)['weight'].sum()

### Each undirected pair was recorded once per direction (A -> B and B -> A), so the summed weight is double the true count; halving it corrects this, just as the matrix is mirrored along the diagonal
df_node_pair_weight['weight'] = df_node_pair_weight['weight']/2
df_node_pair_weight.head()
Out[21]:
node_pair weight
0 AGENCY theory : BOARDS of directors 1.0
1 AGENCY theory : CORPORATE governance 3.0
2 AGENCY theory : CORPORATIONS -- Finance 1.0
3 AGENCY theory : DEBT 1.0
4 AGENCY theory : DECISION making 1.0
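
The halving step can be checked on a small made-up example. In the sketch below, `toy_map` is a hypothetical stand-in for the pair listing: the pair A/B co-occurs in two articles and A/C in one, and every co-occurrence is listed in both directions.

```python
import pandas as pd

# Hypothetical directed listing: each co-occurrence appears once per direction
toy_map = pd.DataFrame({
    "from":   ["A", "B", "A", "C", "A", "B"],
    "to":     ["B", "A", "C", "A", "B", "A"],
    "weight": [1, 1, 1, 1, 1, 1],
})

# Sorting the two names gives the same label for A -> B and B -> A
toy_map["node_pair"] = toy_map[["from", "to"]].apply(lambda r: " : ".join(sorted(r)), axis=1)

# The grouped sum counts both directions, so divide by 2 to get the undirected weight
toy_weights = toy_map.groupby("node_pair")["weight"].sum() / 2
print(toy_weights)
```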

Ranking nodes/node pairs by degree, strength and weight

In [22]:
### Creating rank by node strength - using dense rank to assign ranks
df_node_info['node_strength_rank'] = df_node_info.node_strength.rank(method='dense', ascending=False).astype(int)
### Creating rank by node degree
df_node_info['node_degree_rank'] = df_node_info.node_degree.rank(method='dense', ascending=False).astype(int)
df_node_info.head()
Out[22]:
node node_degree node_strength node_strength_rank node_degree_rank
0 EQUITY 10 10 43 37
1 DIVERSIFICATION in industry 18 20 35 31
2 INTERORGANIZATIONAL relations 24 27 30 26
3 EXECUTIVE compensation 31 36 23 20
4 KNOWLEDGE management 19 20 35 30
In [23]:
### Creating rank by weight for each node pair
df_node_pair_weight['weight_rank'] = df_node_pair_weight.weight.rank(method='dense', ascending=False).astype(int)
df_node_pair_weight.head()
Out[23]:
node_pair weight weight_rank
0 AGENCY theory : BOARDS of directors 1.0 10
1 AGENCY theory : CORPORATE governance 3.0 8
2 AGENCY theory : CORPORATIONS -- Finance 1.0 10
3 AGENCY theory : DEBT 1.0 10
4 AGENCY theory : DECISION making 1.0 10
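
Dense ranking gives tied values the same rank and leaves no gaps after a tie, which is why many pairs above share rank 10. A minimal sketch with made-up values (`toy_strengths` is hypothetical) contrasts it with the `min` method, which skips ranks after a tie.

```python
import pandas as pd

# Hypothetical strengths with one tie
toy_strengths = pd.Series([265, 144, 144, 80], index=["a", "b", "c", "d"])

# Dense: ties share a rank and the next rank follows immediately
dense = toy_strengths.rank(method="dense", ascending=False).astype(int)

# Min: ties share a rank but the next rank skips ahead by the size of the tie
minimum = toy_strengths.rank(method="min", ascending=False).astype(int)

print(dense.tolist(), minimum.tolist())
```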

Getting the top 10 nodes by degree and strength

By degree

In [24]:
df_node_info[df_node_info["node_degree_rank"]<=10].sort_values("node_degree_rank")
Out[24]:
node node_degree node_strength node_strength_rank node_degree_rank
23 ORGANIZATIONAL behavior 166 265 1 1
112 ORGANIZATIONAL effectiveness 104 144 2 2
16 MANAGEMENT science 102 136 3 3
20 PERSONNEL management 93 126 4 4
8 DECISION making 90 112 5 5
156 ORGANIZATIONAL structure 74 107 6 6
40 ORGANIZATIONAL sociology 66 96 7 7
88 STRATEGIC planning 66 80 10 7
47 INDUSTRIAL management 64 84 9 8
9 CORPORATE governance 62 85 8 9
26 TEAMS in the workplace 55 78 11 10

By strength

In [25]:
df_node_info[df_node_info["node_strength_rank"]<=10].sort_values("node_strength_rank")
Out[25]:
node node_degree node_strength node_strength_rank node_degree_rank
23 ORGANIZATIONAL behavior 166 265 1 1
112 ORGANIZATIONAL effectiveness 104 144 2 2
16 MANAGEMENT science 102 136 3 3
20 PERSONNEL management 93 126 4 4
8 DECISION making 90 112 5 5
156 ORGANIZATIONAL structure 74 107 6 6
40 ORGANIZATIONAL sociology 66 96 7 7
9 CORPORATE governance 62 85 8 9
47 INDUSTRIAL management 64 84 9 8
88 STRATEGIC planning 66 80 10 7

Getting top 10 node pairs by weight

In [26]:
### As node pairs that have the same weight are given the same rank, and rank 10 corresponds to weight = 1, all pairs are shown
df_node_pair_weight[df_node_pair_weight["weight_rank"]<=10].sort_values("weight_rank")
Out[26]:
node_pair weight weight_rank
1796 ORGANIZATIONAL behavior : ORGANIZATIONAL effec... 11.0 1
1800 ORGANIZATIONAL behavior : ORGANIZATIONAL struc... 9.0 2
1802 ORGANIZATIONAL behavior : PERSONNEL management 8.0 3
1648 MANAGEMENT science : ORGANIZATIONAL behavior 7.0 4
1799 ORGANIZATIONAL behavior : ORGANIZATIONAL socio... 6.0 5
... ... ... ...
735 DECISION making : WORKFLOW 1.0 10
734 DECISION making : UNITED States -- National Guard 1.0 10
733 DECISION making : TRANSACTION costs 1.0 10
835 EMPLOYEE ownership : FAMILY-owned business ent... 1.0 10
2140 WORK attitudes : WORK environment 1.0 10

2141 rows × 3 columns

In [27]:
### Alternatively, nlargest returns the 10 rows with the highest weight
### NOTE: this cuts off at exactly 10 rows, so node pairs tied with the 10th row's weight are dropped
df_node_pair_weight.nlargest(10,'weight')
Out[27]:
node_pair weight weight_rank
1796 ORGANIZATIONAL behavior : ORGANIZATIONAL effec... 11.0 1
1800 ORGANIZATIONAL behavior : ORGANIZATIONAL struc... 9.0 2
1802 ORGANIZATIONAL behavior : PERSONNEL management 8.0 3
1648 MANAGEMENT science : ORGANIZATIONAL behavior 7.0 4
437 CORPORATE governance : ORGANIZATIONAL behavior 6.0 5
704 DECISION making : ORGANIZATIONAL behavior 6.0 5
1799 ORGANIZATIONAL behavior : ORGANIZATIONAL socio... 6.0 5
1881 ORGANIZATIONAL effectiveness : ORGANIZATIONAL ... 6.0 5
1279 INDUSTRIAL management : ORGANIZATIONAL behavior 5.0 6
1352 INDUSTRIAL relations : ORGANIZATIONAL behavior 5.0 6
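
If the cut-off at exactly 10 rows is a concern, `nlargest` accepts `keep="all"`, which retains every row tied with the last selected value. The sketch below uses a hypothetical four-row table (`toy_pairs`) to show the difference.

```python
import pandas as pd

# Hypothetical pair weights with a tie at weight 2.0
toy_pairs = pd.DataFrame({
    "node_pair": ["A : B", "A : C", "B : C", "C : D"],
    "weight": [3.0, 2.0, 2.0, 1.0],
})

# Default behaviour: exactly n rows, ties broken by order of appearance
top_two_first = toy_pairs.nlargest(2, "weight")

# keep="all": rows tied with the n-th value are all retained
top_two_all = toy_pairs.nlargest(2, "weight", keep="all")

print(len(top_two_first), len(top_two_all))
```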

Plotting average strength vs degree

In [28]:
### Averaging node strength over all nodes that share the same degree
df_scatter_plot = df_node_info.groupby(['node_degree'], as_index=False)['node_strength'].mean()
df_scatter_plot = df_scatter_plot.rename(columns={"node_strength": "avg_node_strength"})
df_scatter_plot.head()
Out[28]:
node_degree avg_node_strength
0 4 4.0
1 7 7.0
2 8 8.0
3 9 9.0
4 10 10.0
In [29]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style='whitegrid')
fig = plt.gcf()

# Change seaborn plot size
fig.set_size_inches(12, 8)

p=sns.scatterplot(x='node_degree', 
                y='avg_node_strength', 
                data=df_scatter_plot)

p.set_xlabel("Node Degree", fontsize = 12)
p.set_ylabel("Average Strength", fontsize = 12)
plt.title("Average Strength Vs Degree", size = 16)
Out[29]:
Text(0.5, 1.0, 'Average Strength Vs Degree')

Converting to HTML

In [29]:
!jupyter nbconvert --to html Project_3_Task_1_Group_76.ipynb